This paper examines the encoding of analogy in large-scale pretrained language models, such as BERT and GPT-2. Existing analogy datasets typically focus on a limited set of analogical relations, with high similarity between the two domains across which the analogy holds. As a more realistic setup, we introduce the Scientific and Creative Analogy dataset (SCAN), a novel analogy dataset containing systematic mappings of multiple attributes and relational structures across dissimilar domains. Using this dataset, we test the analogical reasoning capabilities of several widely used pretrained language models (LMs). We find that state-of-the-art LMs achieve low performance on these complex analogy tasks, highlighting the challenges still posed by analogy understanding.
It has been experimentally demonstrated that humans are able to learn in a manner that allows them to make predictions on categories for which they have not seen any examples (Malaviya et al., 2022). Sucholutsky and Schonlau (2020) have recently presented a machine learning approach that aims to do the same. They utilise synthetically generated data and demonstrate that it is possible to achieve sub-linear scaling and develop models that can learn to recognise N classes from M training samples where M is less than N, a setting known as less-than-one-shot learning. Their method was, however, defined only for univariate or simple multivariate data (Sucholutsky et al., 2021). We extend it to work on large, high-dimensional, real-world datasets and empirically validate it in this new and challenging setting. We apply this method to learn previously unseen NLP tasks from very few examples (4, 8 or 16). We first generate compact less-than-one-shot representations called soft-label prototypes, which are fitted on training data and capture the distribution of different classes across the input domain space. We then use a modified k-Nearest Neighbours classifier to demonstrate that soft-label prototypes can classify data competitively, even outperforming much more computationally complex few-shot learning methods.
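The idea of classifying with soft-label prototypes can be illustrated with a minimal sketch. It assumes a distance-weighted nearest-neighbour rule: each prototype carries a probability distribution over classes, and a query is assigned the class with the largest inverse-distance-weighted sum of those distributions. The function name `soft_label_knn_predict`, the toy prototypes, and the weighting rule are illustrative assumptions, not the authors' implementation.

```python
import math


def soft_label_knn_predict(x, prototypes, k=2, eps=1e-12):
    """Classify point x from soft-label prototypes (illustrative sketch).

    prototypes: list of (location, soft_label) pairs, where soft_label is a
    list of class probabilities. Names and weighting rule are assumptions.
    """
    # Keep the k prototypes closest to the query point.
    nearest = sorted(prototypes, key=lambda p: math.dist(x, p[0]))[:k]
    n_classes = len(nearest[0][1])
    scores = [0.0] * n_classes
    for location, soft_label in nearest:
        # Inverse-distance weight; eps avoids division by zero.
        weight = 1.0 / (math.dist(x, location) + eps)
        for c, prob in enumerate(soft_label):
            scores[c] += weight * prob
    # Predict the class with the largest weighted soft-label mass.
    return max(range(n_classes), key=lambda c: scores[c])


# Two prototypes on a line separate three classes (M = 2 < N = 3).
toy_prototypes = [
    ((0.0,), [0.6, 0.4, 0.0]),
    ((1.0,), [0.0, 0.4, 0.6]),
]
predictions = [soft_label_knn_predict((v,), toy_prototypes) for v in (0.1, 0.5, 0.9)]
```

In this toy setup the middle class never dominates either prototype's label, yet it wins for queries halfway between the two prototypes, which is how fewer prototypes than classes can still yield a decision region for every class.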
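Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.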
Missing values are a common problem in data science and machine learning. Removing instances with missing values can adversely affect the quality of further data analysis. This is exacerbated when there are many more features than instances, so that the proportion of affected instances is high. Such a scenario is common in many important domains; for example, single nucleotide polymorphism (SNP) datasets provide a large number of features over a genome for a relatively small number of individuals. To preserve as much information as possible prior to modeling, a rigorous imputation scheme is acutely needed. While Denoising Autoencoders are a state-of-the-art method for imputation in high-dimensional data, they still require enough complete cases to be trained on, which are often not available in real-world problems. In this paper, we consider missing value imputation as a multi-label classification problem and propose Chains of Autoreplicative Random Forests. Using multi-label Random Forests instead of neural networks works well for low-sampled data, as there are fewer parameters to optimize. Experiments on several SNP datasets show that our algorithm effectively imputes missing values based only on information from the dataset and exhibits better performance than standard algorithms that do not require any additional information. In this paper, the algorithm is implemented specifically for SNP data, but it can easily be adapted for other cases of missing value imputation.
During training, reinforcement learning systems interact with the world without considering the safety of their actions. When deployed into the real world, such systems can be dangerous and cause harm to their surroundings. Often, dangerous situations can be mitigated by defining a set of rules that the system should not violate under any conditions. For example, in robot navigation, one safety rule would be to avoid colliding with surrounding objects and people. In this work, we define safety rules in terms of the relationships between the agent and objects and use them to prevent reinforcement learning systems from performing potentially harmful actions. We propose a new safe epsilon-greedy algorithm that uses safety rules to override agents' actions if they are considered to be unsafe. In our experiments, we show that a safe epsilon-greedy policy significantly increases the safety of the agent during training, improves the learning efficiency resulting in much faster convergence, and achieves better performance than the base model.
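The override step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the safety rules are reduced to a hypothetical predicate `is_unsafe(action)`, the agent's knowledge is a plain list of action values, and an unsafe proposal is replaced by the highest-valued safe action.

```python
import random


def safe_epsilon_greedy(q_values, is_unsafe, epsilon=0.1, rng=random):
    """Choose an action index, overriding proposals flagged as unsafe.

    q_values:  list of estimated action values (hypothetical input).
    is_unsafe: callable(action) -> bool, a stand-in for the safety rules.
    """
    actions = list(range(len(q_values)))
    safe_actions = [a for a in actions if not is_unsafe(a)]

    # Ordinary epsilon-greedy proposal: explore with probability epsilon.
    if rng.random() < epsilon:
        proposed = rng.choice(actions)
    else:
        proposed = max(actions, key=lambda a: q_values[a])

    # Override: replace an unsafe proposal with the best safe action.
    if is_unsafe(proposed) and safe_actions:
        return max(safe_actions, key=lambda a: q_values[a])
    # If no safe action exists, this sketch leaves the proposal unchanged.
    return proposed


# Action 1 has the highest value but is flagged as unsafe.
example_q = [1.0, 5.0, 3.0]
chosen = safe_epsilon_greedy(example_q, lambda a: a == 1, epsilon=0.0)
```

Other fallbacks are possible, such as sampling uniformly among the safe actions during exploration; the choice here simply keeps the sketch short.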
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
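We present RuDSI, a new benchmark for word sense induction (WSI) in Russian. The dataset was created using manual annotation and semi-automatic clustering of Word Usage Graphs (WUGs). Unlike prior WSI datasets for Russian, RuDSI is completely data-driven (based on texts from the Russian National Corpus), with no external word senses imposed on annotators. Depending on the parameters of graph clustering, different derivative datasets can be produced from the raw annotation. We report the performance that several baseline WSI methods obtain on RuDSI and discuss possibilities for improving these scores.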
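To achieve high-fidelity haptic rendering of soft objects in a high-mobility virtual environment, we propose a novel haptic display, DandelionTouch. The tactile actuators are delivered to the fingertips of the user by a swarm of drones. Users of DandelionTouch are able to experience tactile feedback in a large space that is not limited by the device's working area. Importantly, they do not experience muscle fatigue during long interactions with virtual objects. Hand tracking and a swarm control algorithm allow the swarm to be guided with hand motions and avoid collisions inside the formation. Several topologies of impedance connection between swarm units were investigated in this research. An experiment in which drones performed a point-following task on a square trajectory in real time revealed that drones connected in a Star topology followed the trajectory with low mean positional error (RMSE decreased by 20.6% in comparison with other impedance topologies and by 40.9% in comparison with potential-field-based swarm control). The velocities achieved by the drones in all formations with impedance behavior were 28% higher than for the swarm controlled with the potential field algorithm. Additionally, the perception of several vibrotactile patterns was evaluated in a user study with 7 participants. The study showed that the proposed combination of temporal delay and frequency modulation allows users to successfully recognize the surface property and motion direction in VR simultaneously (mean recognition rate of 70%, maximum of 93%). DandelionTouch suggests a new type of haptic feedback in VR systems where no hand-held or wearable interface is required.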
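Coherent microscopy techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices, from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent X-ray microscopy methods such as ptychography are poised to revolutionize nanoscale materials characterization. However, the associated significant increases in data and compute needs mean that conventional approaches no longer suffice for recovering sample images in real time from high-speed coherent imaging experiments. Here, we demonstrate a workflow that leverages artificial intelligence at the edge and high-performance computing to enable real-time inversion of X-ray ptychography data streamed directly from a detector. The proposed AI-enabled workflow eliminates the sampling constraints imposed by traditional ptychography, allowing low-dose imaging using orders of magnitude less data than required by traditional methods.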
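Modern time-domain photometric surveys collect many observations of various astronomical objects, and the coming era of large-scale surveys will provide even more information. Most objects have never received a spectroscopic follow-up, which is especially crucial for transients, e.g. supernovae. In such cases, observed light curves can offer an affordable alternative. Time series are actively used for photometric classification and characterization, such as peak and luminosity decline estimation. However, the collected time series are multidimensional, irregularly sampled, contain outliers, and do not have well-defined systematic uncertainties. Machine learning methods help extract useful information from the available data in the most efficient way. We consider several light curve approximation methods based on neural networks (Multilayer Perceptrons, Bayesian Neural Networks, and Normalizing Flows) to approximate observations of a single light curve. Tests using both simulated PLAsTiCC and real Zwicky Transient Facility data samples demonstrate that even a few observations are enough to fit the networks and achieve better approximation quality than other state-of-the-art methods. We show that the methods described in this work have better computational complexity and run faster than Gaussian Processes. We analyze the performance of approximation techniques aimed at filling the gaps in light curve observations, and show that using an appropriate technique increases the accuracy of peak finding and supernova classification. In addition, the study results are organized in the Fulu Python library, available on GitHub, which can be easily used by the community.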